Basic R Commands

The commands below

  1. create vectors x and y

  2. Returning the gtheir length

  3. sum them together

  4. checking the objects present in the environment

  5. removing x and y from the environment

  6. create a 2X2 matrix with numbers from 1 to 4

  7. find the square root and the square of each matrix’s component

  8. Drawing random numbers from the Normal distribution

  9. Compute summary statistics such as sample mean, variance, standard deviation and correlation

x <- c(1,3,2,5)
x
## [1] 1 3 2 5
y = c(1,4,3)
y
## [1] 1 4 3
length(x)
## [1] 4
length(y)
## [1] 3
x+y
## Warning in x + y: longer object length is not a multiple of shorter object
## length
## [1] 2 7 5 6
ls()
## [1] "x" "y"
rm(x,y)
ls()
## character(0)
rm(list=ls())
?matrix
x=matrix(data=c(1,2,3,4), nrow=2, ncol=2)
x
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
x=matrix(c(1,2,3,4),2,2)
matrix(c(1,2,3,4),2,2,byrow=TRUE)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
sqrt(x)
##          [,1]     [,2]
## [1,] 1.000000 1.732051
## [2,] 1.414214 2.000000
x^2
##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16
x=rnorm(50)
y=x+rnorm(50,mean=50,sd=.1)
cor(x,y)
## [1] 0.9953366
set.seed(1303)
rnorm(50)
##  [1] -1.1439763145  1.3421293656  2.1853904757  0.5363925179  0.0631929665
##  [6]  0.5022344825 -0.0004167247  0.5658198405 -0.5725226890 -1.1102250073
## [11] -0.0486871234 -0.6956562176  0.8289174803  0.2066528551 -0.2356745091
## [16] -0.5563104914 -0.3647543571  0.8623550343 -0.6307715354  0.3136021252
## [21] -0.9314953177  0.8238676185  0.5233707021  0.7069214120  0.4202043256
## [26] -0.2690521547 -1.5103172999 -0.6902124766 -0.1434719524 -1.0135274099
## [31]  1.5732737361  0.0127465055  0.8726470499  0.4220661905 -0.0188157917
## [36]  2.6157489689 -0.6931401748 -0.2663217810 -0.7206364412  1.3677342065
## [41]  0.2640073322  0.6321868074 -1.3306509858  0.0268888182  1.0406363208
## [46]  1.3120237985 -0.0300020767 -0.2500257125  0.0234144857  1.6598706557
set.seed(3)
y=rnorm(100)
mean(y)
## [1] 0.01103557
var(y)
## [1] 0.7328675
sqrt(var(y))
## [1] 0.8560768
sd(y)
## [1] 0.8560768

Activity 1

Draw 10 numbers from the normal distribution with mean 5 and variance 2. Store them in the vector x and calculate the sample mean and standard deviation. Put them in a 2 X 5 matrix by row and and raise each number to the power of three.

Graphics

The commands below show how to: 1. Do a scatter plot with the command plot and save it to a pdf file. 2. Create sequences and store them into vectors 3. Do a contour plot with the command contour. This requires three arguments: a vector of the x values, the y values and a matrix whose values correspond to each (x,y) pair (coordinates). Also do a heatmap with the command image. 4. Produce a 3-d plot for the above with the commande persp. The two arguments theta and phi control the angles at which the plot is viewed.

x=rnorm(100)
y=rnorm(100)
plot(x,y)

plot(x,y,xlab="this is the x-axis",ylab="this is the y-axis",main="Plot of X vs Y")

pdf("Figure.pdf")
plot(x,y,col="green")
dev.off()
## quartz_off_screen 
##                 2
x=seq(1,10)
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x=1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x=seq(-pi,pi,length=50)
y=x
f=outer(x,y,function(x,y)cos(y)/(1+x^2))
contour(x,y,f)
contour(x,y,f,nlevels=45,add=T)

fa=(f-t(f))/2
contour(x,y,fa,nlevels=15)

image(x,y,fa)

persp(x,y,fa)

persp(x,y,fa,theta=30)

persp(x,y,fa,theta=30,phi=20)

persp(x,y,fa,theta=30,phi=70)

persp(x,y,fa,theta=30,phi=40)

Activity 2

Construct x and y to be sequences of length 100 from -5 to 5. Then 1. Set f1 to be 2 times x and do a scatter plot of x and f1. 2. Set f2 to be 2 times x minus 3 times y and produce a contour plot, a heatmap and 3d plot of f2 vs x and y

Indexing Data

Below are some commands for practicing in working with matrices and accessing their elements

A=matrix(1:16,4,4)
A
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
A[2,3]
## [1] 10
A[c(1,3),c(2,4)]
##      [,1] [,2]
## [1,]    5   13
## [2,]    7   15
A[1:3,2:4]
##      [,1] [,2] [,3]
## [1,]    5    9   13
## [2,]    6   10   14
## [3,]    7   11   15
A[1:2,]
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
A[,1:2]
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6
## [3,]    3    7
## [4,]    4    8
A[1,]
## [1]  1  5  9 13
A[-c(1,3),]
##      [,1] [,2] [,3] [,4]
## [1,]    2    6   10   14
## [2,]    4    8   12   16
A[-c(1,3),-c(1,3,4)]
## [1] 6 8
dim(A)
## [1] 4 4

Loading Data

Below are some commands for loading data from an existing file.

Auto=read.table("Auto.data.txt")
dim(Auto)
## [1] 398   9
names(Auto)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9"
Auto=read.table("Auto.data.txt",header=T,na.strings="?")
dim(Auto)
## [1] 397   9
names(Auto)
## [1] "mpg"          "cylinders"    "displacement" "horsepower"  
## [5] "weight"       "acceleration" "year"         "origin"      
## [9] "name"
Auto=read.csv("Auto.csv",header=T,na.strings="?")
dim(Auto)
## [1] 397   9
names(Auto)
## [1] "mpg"          "cylinders"    "displacement" "horsepower"  
## [5] "weight"       "acceleration" "year"         "origin"      
## [9] "name"
Auto[1:4,]
##   mpg cylinders displacement horsepower weight acceleration year origin
## 1  18         8          307        130   3504         12.0   70      1
## 2  15         8          350        165   3693         11.5   70      1
## 3  18         8          318        150   3436         11.0   70      1
## 4  16         8          304        150   3433         12.0   70      1
##                        name
## 1 chevrolet chevelle malibu
## 2         buick skylark 320
## 3        plymouth satellite
## 4             amc rebel sst
Auto=na.omit(Auto)
dim(Auto)
## [1] 392   9
names(Auto)
## [1] "mpg"          "cylinders"    "displacement" "horsepower"  
## [5] "weight"       "acceleration" "year"         "origin"      
## [9] "name"

Additional Graphical and Numerical Summaries

Below are some commands to do exploratory analysis on the Auto dataset

Auto=read.csv("Auto.csv",header=T,na.strings="?")
plot(Auto$cylinders, Auto$mpg)

attach(Auto)
plot(cylinders, mpg)

cylinders=as.factor(cylinders)
plot(cylinders, mpg)

plot(cylinders, mpg, col="red")

plot(cylinders, mpg, col="red", varwidth=T)

plot(cylinders, mpg, col="red", varwidth=T,horizontal=T)

plot(cylinders, mpg, col="red", varwidth=T, xlab="cylinders", ylab="MPG")

hist(mpg)

hist(mpg,col=2)

hist(mpg,col=2,breaks=15)

pairs(Auto)

pairs(~ mpg + displacement + horsepower + weight + acceleration, Auto)

plot(horsepower,mpg)
identify(horsepower,mpg,name)

## integer(0)
summary(Auto)
##       mpg          cylinders      displacement     horsepower   
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0  
##  1st Qu.:17.50   1st Qu.:4.000   1st Qu.:104.0   1st Qu.: 75.0  
##  Median :23.00   Median :4.000   Median :146.0   Median : 93.5  
##  Mean   :23.52   Mean   :5.458   Mean   :193.5   Mean   :104.5  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:262.0   3rd Qu.:126.0  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0  
##                                                  NA's   :5      
##      weight      acceleration        year           origin     
##  Min.   :1613   Min.   : 8.00   Min.   :70.00   Min.   :1.000  
##  1st Qu.:2223   1st Qu.:13.80   1st Qu.:73.00   1st Qu.:1.000  
##  Median :2800   Median :15.50   Median :76.00   Median :1.000  
##  Mean   :2970   Mean   :15.56   Mean   :75.99   Mean   :1.574  
##  3rd Qu.:3609   3rd Qu.:17.10   3rd Qu.:79.00   3rd Qu.:2.000  
##  Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000  
##                                                                
##              name    
##  ford pinto    :  6  
##  amc matador   :  5  
##  ford maverick :  5  
##  toyota corolla:  5  
##  amc gremlin   :  4  
##  amc hornet    :  4  
##  (Other)       :368
summary(mpg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.00   17.50   23.00   23.52   29.00   46.60

Activity 3

This exercise involves the Boston housing data set which is part of the MASS library in R. You can read about it typing ?Boston after loading the library. To load and view part of it use the following commands:

library(MASS)
head(Boston)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12
##   lstat medv
## 1  4.98 24.0
## 2  9.14 21.6
## 3  4.03 34.7
## 4  2.94 33.4
## 5  5.33 36.2
## 6  5.21 28.7
  1. How many rows are in this data set? How many columns? What do the rows and columns represent?

  2. Make some pairwise scatterplots of the predictors (columns) in this data set. Describe your findings.

  3. Are any of the predictors associated with per capita crime rate? If so, explain the relationship.

  4. Do any of the suburbs of Boston appear to have particularly high crime rates? Tax rates? Pupil-teacher ratios? Comment on the range of each predictor.

  5. How many of the suburbs in this data set bound the Charles river?

  6. What is the median pupil-teacher ratio among the towns in this data set?

  7. Which suburb of Boston has lowest median value of owner- occupied homes? What are the values of the other predictors for that suburb, and how do those values compare to the overall ranges for those predictors? Comment on your findings.

  8. In this data set, how many of the suburbs average more than seven rooms per dwelling? More than eight rooms per dwelling? Comment on the suburbs that average more than eight rooms per dwelling.